Patients care about what their teeth will look like after orthodontic treatment. Orthodontists usually describe the expected tooth movement based on the patient's original smile images, which is often unconvincing. The rise of deep-learning generative models changes this situation: such models can visualize the outcome of orthodontic treatment and help patients foresee their future teeth and facial appearance. While previous studies mainly focus on 2D or 3D virtual treatment outcome (VTO) at the profile level, the problem of simulating the treatment outcome in a frontal facial image is poorly explored. In this paper, we build an efficient and accurate system for simulating virtual teeth-alignment effects in a frontal facial image. Our system takes a frontal face image of a patient with visible malpositioned teeth and the patient's 3D scanned teeth model as input, and progressively generates visual results of the patient's teeth given the specific orthodontic planning steps from the doctor (i.e., the specified translations and rotations of individual teeth). We design a multi-modal encoder-decoder generative model to synthesize identity-preserving frontal facial images with aligned teeth. In addition, the color information of the original image is used to optimize the orthodontic outcomes, making the results more natural. We conduct extensive qualitative and clinical experiments, as well as a pilot study, to validate our method.
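The abstract specifies an orthodontic planning step as per-tooth translations and rotations applied to the scanned 3D teeth model. As a hedged illustration of that input format only (not the authors' generative pipeline), the sketch below applies one planning step to a per-tooth point cloud; the dictionary layout, rotation convention, and field names are assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def apply_planning_step(tooth_points, step):
    """Apply one orthodontic planning step to a scanned teeth model.

    tooth_points: dict mapping tooth id -> (N, 3) array of vertices (assumed format).
    step: dict mapping tooth id -> {"rotation_euler_deg": (rx, ry, rz),
                                    "translation_mm": (tx, ty, tz)} (assumed format).
    Each tooth is rotated about its own centroid, then translated.
    """
    moved = {}
    for tooth_id, pts in tooth_points.items():
        if tooth_id not in step:
            moved[tooth_id] = pts.copy()
            continue
        centroid = pts.mean(axis=0)
        rot = Rotation.from_euler("xyz", step[tooth_id]["rotation_euler_deg"], degrees=True)
        rotated = rot.apply(pts - centroid) + centroid
        moved[tooth_id] = rotated + np.asarray(step[tooth_id]["translation_mm"])
    return moved
```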
With increasing scale, large language models demonstrate both quantitative improvements and new qualitative capabilities, especially as zero-shot learners, like GPT-3. However, these results rely heavily on delicate prompt design and heavy computation. In this work, we explore whether strong zero-shot ability can be achieved at a smaller model scale without any external supervised data. To achieve this goal, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuning for short) that uses a small amount of task-aware self-supervised data to further update language models. Experiments show that Go-tuning enables T5-small (80M) to achieve competitive zero-shot results compared with large language models such as T5-XL (3B). We also apply Go-tuning to multi-task settings and develop a multi-task model, mgo-T5 (250M), which reaches the average performance of OPT (175B) on 9 datasets.
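The core mechanism described above is continuing masked-language-modeling updates on a small amount of task-aware, self-supervised data. Below is a minimal sketch of one such update step for T5-small using T5's span-corruption format with sentinel tokens; the masking scheme and example text are illustrative assumptions, not the authors' Go-tuning procedure (which additionally uses geometry guidance and task-aware data selection).

```python
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5TokenizerFast.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One task-aware sentence with a span masked out in T5's sentinel format.
inputs = tokenizer("The movie review was overwhelmingly <extra_id_0> .", return_tensors="pt")
labels = tokenizer("<extra_id_0> positive <extra_id_1>", return_tensors="pt").input_ids

# Standard masked-span reconstruction loss; Go-tuning would further guide
# which self-supervised examples are selected for these updates.
loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```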
In this work, we study the black-box targeted attack problem from the model discrepancy perspective. On the theoretical side, we present a generalization error bound for black-box targeted attacks, which gives a rigorous theoretical analysis for guaranteeing the success of the attack. We reveal that the attack error on a target model mainly depends on the empirical attack error on the substitute model and the maximum model discrepancy among substitute models. On the algorithmic side, we derive a new algorithm for black-box targeted attacks based on our theoretical analysis, in which we additionally minimize the maximum model discrepancy (M3D) of the substitute models when training the generator to produce adversarial examples. In this way, our model is capable of crafting highly transferable adversarial examples that are robust to model variation, thus improving the success rate of attacking the black-box model. We conduct extensive experiments on the ImageNet dataset with different classification models, and our proposed approach outperforms existing state-of-the-art methods by a significant margin. Our code will be released.
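The algorithmic idea above combines a targeted attack loss on the substitute models with a term that minimizes the discrepancy between their predictions on the generated adversarial examples. The sketch below shows one such generator update under stated assumptions: two substitute classifiers, a perturbation generator `gen`, symmetric KL as the (assumed) discrepancy measure, and hand-picked weights; the authors' exact losses and min-max training schedule for the substitutes may differ.

```python
import torch
import torch.nn.functional as F

def generator_step(gen, sub_a, sub_b, images, target_class, optimizer,
                   eps=16 / 255, lam=1.0):
    """One update of the perturbation generator for a targeted black-box attack.

    Minimizes (i) the targeted classification loss on both substitutes and
    (ii) the discrepancy between the substitutes' predictions on the
    adversarial examples (symmetric KL here, an assumed choice).
    """
    delta = torch.clamp(gen(images), -eps, eps)            # bounded perturbation
    x_adv = torch.clamp(images + delta, 0.0, 1.0)

    logits_a, logits_b = sub_a(x_adv), sub_b(x_adv)
    target = torch.full((images.size(0),), target_class, device=images.device)

    attack_loss = F.cross_entropy(logits_a, target) + F.cross_entropy(logits_b, target)
    log_pa, log_pb = F.log_softmax(logits_a, dim=1), F.log_softmax(logits_b, dim=1)
    discrepancy = 0.5 * (F.kl_div(log_pa, log_pb.exp(), reduction="batchmean")
                         + F.kl_div(log_pb, log_pa.exp(), reduction="batchmean"))

    loss = attack_loss + lam * discrepancy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```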
Breast cancer is one of the common cancers that endanger the health of women globally. Accurate target lesion segmentation is essential for early clinical intervention and postoperative follow-up. Recently, many convolutional neural networks (CNNs) have been proposed to segment breast tumors from ultrasound images. However, complex ultrasound patterns and the variable shapes and sizes of tumors make accurate segmentation of breast lesions challenging. Motivated by selective kernel convolution, we introduce an enhanced selective kernel convolution for breast tumor segmentation, which integrates multiple feature-map region representations and adaptively recalibrates the weights of these feature-map regions along the channel and spatial dimensions. This region recalibration strategy enables the network to focus more on high-contributing region features and to mitigate the perturbation of less useful regions. Finally, the enhanced selective kernel convolution is integrated into a U-Net with deep supervision constraints to adaptively capture robust representations of breast tumors. Extensive comparisons with twelve state-of-the-art deep learning segmentation methods on three public breast ultrasound datasets demonstrate that our method achieves more competitive segmentation performance on breast ultrasound images.
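For context, selective kernel convolution fuses branches with different receptive fields and reweights them with learned attention; the enhancement described above additionally recalibrates along both the channel and spatial dimensions. The PyTorch module below is a hedged sketch of that general idea (branch kernel sizes, reduction ratio, and how channel and spatial attention are combined are assumptions, not the paper's exact design).

```python
import torch
import torch.nn as nn

class SelectiveKernelBlock(nn.Module):
    """Sketch of a selective-kernel-style block with channel and spatial recalibration."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)  # 5x5 receptive field
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1))  # per-branch channel weights
        self.spatial_attn = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        u3, u5 = self.branch3(x), self.branch5(x)
        fused = u3 + u5
        # Channel-wise selection between the two branches (softmax over branches).
        w = self.channel_attn(fused).view(x.size(0), 2, x.size(1), 1, 1).softmax(dim=1)
        selected = w[:, 0] * u3 + w[:, 1] * u5
        # Spatial recalibration from average- and max-pooled descriptors.
        desc = torch.cat([selected.mean(dim=1, keepdim=True),
                          selected.amax(dim=1, keepdim=True)], dim=1)
        return selected * self.spatial_attn(desc)
```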
In this paper, we study the problem of bit allocation in neural video compression (NVC). First, we reveal that a recent bit allocation approach claimed to be optimal is, in fact, sub-optimal due to its implementation. Specifically, we find that its sub-optimality lies in the incorrect application of semi-amortized variational inference (SAVI) to latents with non-factorized variational posteriors. We then show that the corrected version of SAVI on non-factorized latents requires recursively applying back-propagation through gradient ascent, based on which we derive the corrected optimal bit allocation algorithm. Because the corrected bit allocation is computationally infeasible, we design an efficient approximation to make it practical. Empirical results show that our proposed correction significantly improves the incorrect bit allocation in terms of R-D performance and bitrate error, and outperforms all other bit allocation methods by a large margin. The source code is provided in the supplementary material.
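At its core, SAVI-style bit allocation refines the per-frame latents of a pretrained codec by gradient steps on the overall rate-distortion objective rather than using the amortized encoder output directly. The sketch below shows only that basic latent-refinement loop under assumed interfaces (`decode`, `rate`, and `distortion` callables and independently optimized per-frame latents); the paper's correction, which back-propagates recursively through the gradient-ascent updates of non-factorized latents, is not reproduced here.

```python
import torch

def refine_latents(latents, decode, rate, distortion, frames,
                   lmbda=0.01, steps=50, lr=1e-2):
    """SAVI-style refinement: directly optimize per-frame latents for the R-D objective.

    latents: list of tensors, one per frame, initialized from the amortized encoder.
    decode(latent, i): reconstructs frame i from its latent (assumed interface).
    rate(latent): differentiable bit-cost estimate; distortion(x_hat, x): e.g. MSE.
    """
    latents = [z.clone().detach().requires_grad_(True) for z in latents]
    optimizer = torch.optim.Adam(latents, lr=lr)
    for _ in range(steps):
        loss = 0.0
        for i, (z, x) in enumerate(zip(latents, frames)):
            x_hat = decode(z, i)
            loss = loss + rate(z) + lmbda * distortion(x_hat, x)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return [z.detach() for z in latents]
```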
Self-training has shown great potential in semi-supervised learning. Its core idea is to use the model learned on labeled data to generate pseudo-labels for unlabeled samples, and then teach itself. To obtain valid supervision, prevailing practices usually adopt a momentum teacher for pseudo-label prediction, yet observe the confirmation bias problem, where incorrect predictions may provide wrong supervision signals and accumulate during training. The primary cause of this drawback is that the prevailing self-training framework acts as using previous knowledge to guide the current state, because the teacher is updated only with past students. To alleviate this problem, we propose a novel self-training strategy that allows the model to learn from the future. Concretely, at each training step, we first virtually optimize the student (i.e., cache the gradients without applying them to the model weights), then update the teacher with the virtual future student, and finally ask the teacher to produce pseudo-labels as guidance for the current student. In this way, we manage to improve the quality of the pseudo-labels and thus boost performance. We also develop two variants of our future-self-training (FST) framework by peeping into the future deeply (FST-D) and widely (FST-W). Taking unsupervised domain adaptive semantic segmentation and semi-supervised semantic segmentation as instances, we experimentally demonstrate the effectiveness and superiority of our approach under a wide range of settings. Code will be made publicly available.
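The training step described above (virtually update the student without committing its weights, move the teacher toward that virtual future student, then pseudo-label with the updated teacher) can be written compactly. The sketch below is a hedged rendering of that loop for a generic classifier; the single-gradient-step virtual update, the EMA momentum, and the confidence threshold are assumptions rather than the exact FST-D/FST-W procedures.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, params, momentum=0.999):
    """Move teacher weights toward the given (virtual) student parameters."""
    for t, p in zip(teacher.parameters(), params):
        t.mul_(momentum).add_(p, alpha=1.0 - momentum)

def fst_step(student, teacher, labeled, unlabeled, optimizer, lr=0.01, thresh=0.9):
    x_l, y_l = labeled
    x_u = unlabeled

    # 1) Virtually optimize the student: compute gradients but do not apply them.
    sup_loss = F.cross_entropy(student(x_l), y_l)
    grads = torch.autograd.grad(sup_loss, list(student.parameters()))
    virtual_params = [p.detach() - lr * g for p, g in zip(student.parameters(), grads)]

    # 2) Update the teacher with the virtual future student.
    ema_update(teacher, virtual_params)

    # 3) The updated teacher produces pseudo-labels to guide the current student.
    with torch.no_grad():
        probs = teacher(x_u).softmax(dim=1)
        conf, pseudo = probs.max(dim=1)
    unsup_loss = (F.cross_entropy(student(x_u), pseudo, reduction="none")
                  * (conf > thresh).float()).mean()

    loss = F.cross_entropy(student(x_l), y_l) + unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```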
Benefiting from large-scale pretrained vision-language models (VL-PMs), the performance of visual question answering (VQA) has begun to approach human oracle performance. However, fine-tuning large-scale VL-PMs with limited VQA data usually faces overfitting and poor generalization, leading to a lack of robustness. In this paper, we aim to improve the robustness of VQA systems (i.e., their ability to defend against input variations and human-adversarial attacks) from the perspective of the information bottleneck when fine-tuning VL-PMs for VQA. In general, the internal representations obtained by VL-PMs inevitably contain information irrelevant and redundant to the downstream VQA task, resulting in statistically spurious correlations and insensitivity to input variations. To encourage the representations to converge to a sufficient statistic for vision-language learning, we propose the Correlation Information Bottleneck (CIB) principle, which seeks a trade-off between representation compression and redundancy by minimizing the mutual information (MI) between the inputs and the internal representations while maximizing the MI between the outputs and the representations. Meanwhile, CIB measures the internal correlations between the visual and linguistic inputs and representations via a symmetrized joint MI estimation. Extensive experiments on five VQA input-robustness benchmarks and two VQA benchmarks demonstrate the effectiveness and superiority of the proposed CIB in improving the robustness of VQA systems.
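As a rough rendering of the stated principle, an information-bottleneck-style trade-off over visual input V, language input L, fused representation Z, and answer Y could be written as below; the symbols and the exact form of the correlation term are assumptions for illustration, not the paper's final CIB formulation.

```latex
% Hedged sketch of an IB-style objective matching the description above:
% compress the inputs, keep task information, and (per CIB) account for the
% correlation between the visual and linguistic parts of the representation.
\min_{\theta} \; \underbrace{I(V, L; Z)}_{\text{compress inputs}}
  \;-\; \beta \, \underbrace{I(Z; Y)}_{\text{keep task information}}
  \;-\; \gamma \, \underbrace{I(Z_V; Z_L)}_{\text{visual-linguistic correlation}}
```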
Pathologists need to combine information from differently stained pathology slices to obtain accurate diagnostic results. Deformable image registration is a necessary technique for fusing multi-modal pathology slices. This paper proposes a hybrid deep-feature-based deformable image registration framework for stained pathology samples. We first extract dense feature points and perform point matching through two deep-learning feature networks. Then, to further reduce false matches, we propose an outlier detection method that combines the Isolation Forest statistical model with a local affine correction model. Finally, an interpolation method generates the displacement vector field (DVF) for pathology image registration based on the above matched points. We evaluate our method on the dataset of the Automatic Non-rigid Histological Image Registration (ANHIR) challenge, organized in conjunction with the IEEE ISBI 2019 conference. Our technique outperforms traditional approaches, achieving an average registration target error (rTRE) of 0.0034. The proposed method achieves state-of-the-art performance and was ranked 1st when evaluated on the test dataset. The proposed hybrid deep-feature-based registration method can potentially become a reliable approach for pathology image registration.
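Two concrete steps in the pipeline above are rejecting spurious keypoint matches with an Isolation Forest and interpolating the surviving matches into a dense displacement vector field (DVF). The sketch below illustrates those two steps with scikit-learn and SciPy under simple assumptions (displacement vectors as the Isolation Forest features, plain griddata interpolation); the paper's local affine correction model and deep feature networks are omitted.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from scipy.interpolate import griddata

def matches_to_dvf(src_pts, dst_pts, image_shape, contamination=0.1):
    """Filter matched keypoints and interpolate them into a dense DVF.

    src_pts, dst_pts: (N, 2) arrays of matched (x, y) coordinates.
    Returns an (H, W, 2) displacement field mapping source pixels toward the target.
    """
    disp = dst_pts - src_pts
    # 1) Reject matches whose displacement vectors look anomalous.
    keep = IsolationForest(contamination=contamination, random_state=0) \
        .fit_predict(disp) == 1
    src, disp = src_pts[keep], disp[keep]

    # 2) Interpolate the sparse displacements to every pixel (nearest fill at borders).
    h, w = image_shape
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    dvf = np.zeros((h, w, 2), dtype=np.float32)
    for c in range(2):
        dense = griddata(src, disp[:, c], (grid_x, grid_y), method="linear")
        nearest = griddata(src, disp[:, c], (grid_x, grid_y), method="nearest")
        dvf[..., c] = np.where(np.isnan(dense), nearest, dense)
    return dvf
```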
We propose a flow-guided transformer, which innovatively leverages the motion discrepancy exposed by optical flows to instruct the attention retrieval in the transformer for high-fidelity video inpainting. More specifically, we design a novel flow completion network that completes the corrupted flows by exploiting the relevant flow features in a local temporal window. With the completed flows, we propagate content across video frames and adopt the flow-guided transformer to synthesize the remaining corrupted regions. We decouple the transformer along the temporal and spatial dimensions, so that the locally relevant completed flows can be easily integrated to guide spatial attention only. Furthermore, we design a flow-reweight module to precisely control the impact of the completed flows on each spatial transformer. For efficiency, we introduce a window partition strategy to both the spatial and temporal transformers. In the spatial transformer in particular, we design a dual-perspective spatial MHSA that integrates global tokens with window-based attention. Extensive experiments demonstrate the effectiveness of the proposed method both qualitatively and quantitatively. Code is available at https://github.com/hitachinsk/fgt.
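A key efficiency piece described above is window partitioning for spatial attention; the dual-perspective spatial MHSA then mixes window-local tokens with global tokens. The sketch below shows only the window partition/reverse helpers and a plain window-based multi-head self-attention pass in PyTorch, as an illustrative assumption of that mechanism, without the flow guidance, flow-reweight module, or global tokens of the actual design.

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    """(B, H, W, C) -> (B * num_windows, ws*ws, C); H and W must be divisible by ws."""
    b, h, w, c = x.shape
    x = x.view(b, h // ws, ws, w // ws, ws, c).permute(0, 1, 3, 2, 4, 5)
    return x.reshape(-1, ws * ws, c)

def window_reverse(windows, ws, h, w):
    """Inverse of window_partition."""
    b = windows.shape[0] // ((h // ws) * (w // ws))
    x = windows.view(b, h // ws, w // ws, ws, ws, -1).permute(0, 1, 3, 2, 4, 5)
    return x.reshape(b, h, w, -1)

class WindowSelfAttention(nn.Module):
    """Multi-head self-attention restricted to non-overlapping spatial windows."""

    def __init__(self, dim, heads=4, ws=8):
        super().__init__()
        self.ws = ws
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, H, W, C)
        b, h, w, c = x.shape
        tokens = window_partition(x, self.ws)  # attention stays inside each window
        out, _ = self.attn(tokens, tokens, tokens)
        return window_reverse(out, self.ws, h, w)
```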
Contrastive learning (CL) has recently been applied to adversarial learning tasks. Such practice considers adversarial samples as additional positive views of an instance and, by maximizing their agreement with each other, yields better adversarial robustness. However, this mechanism can be flawed, since adversarial perturbations may cause instance-level identity confusion, which can hinder CL performance by pulling together different instances with separate identities. To address this issue, we propose to treat adversarial samples unequally when contrasted, with an asymmetric InfoNCE objective (A-InfoNCE) that allows discriminating consideration of adversarial samples. Specifically, adversaries are viewed as inferior positives that induce weaker learning signals, or as hard negatives exhibiting higher contrast to other negative samples. In this asymmetric manner, the adverse impacts of the conflicting objectives between CL and adversarial learning can be effectively mitigated. Experiments show that our approach consistently outperforms existing adversarial CL methods across different fine-tuning schemes, without additional computational cost. The proposed A-InfoNCE is also a generic form that can be readily extended to other CL methods. Code is available at https://github.com/yqy2001/a-infonce.
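The asymmetry described above amounts, in its simplest form, to reweighting InfoNCE terms so that adversarial views act as weaker positives than clean views. The loss sketch below encodes only that single idea in PyTorch; the positive-weight hyperparameter and the plain in-batch similarity matrix are assumptions, and the hard-negative variant and the full A-InfoNCE formulation from the paper are not reproduced.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.2):
    """Plain InfoNCE: in-batch negatives, matching indices are positives."""
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    logits = anchor @ positive.t() / temperature
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)

def asymmetric_adv_cl_loss(z_aug1, z_aug2, z_adv, adv_pos_weight=0.5):
    """Treat the adversarial view as an inferior positive.

    z_aug1, z_aug2: embeddings of two clean augmentations of the same images.
    z_adv: embeddings of their adversarial views.
    adv_pos_weight < 1 gives adversarial positives a weaker pull than clean ones,
    the asymmetry sketched here; the hard-negative treatment is not shown.
    """
    clean_term = info_nce(z_aug1, z_aug2)
    adv_term = 0.5 * (info_nce(z_aug1, z_adv) + info_nce(z_aug2, z_adv))
    return clean_term + adv_pos_weight * adv_term
```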